A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness

نویسندگان

  • Archna Bhatia
  • Mandy Simons
  • Lori S. Levin
  • Yulia Tsvetkov
  • Chris Dyer
  • Jordan Bender
چکیده

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information associated with noun phrases, which we call communicative functions. A survey of the linguistics literature suggests that definiteness does not express a single communicative function but is a grammaticalization of many such functions, for example, identifiability, familiarity, uniqueness, and specificity. Our annotation scheme unifies ideas from previous research on definiteness while attempting to remove redundancy. The scheme encodes the communicative functions of definiteness rather than the grammatical forms of definiteness. We assume that the communicative functions are largely maintained across languages while the grammaticalization of this information may vary. Corpora that are annotated using communicative functions can be used to train classifiers, offering data-driven insights into the grammaticalization of definiteness in different languages. We release our annotated corpora for English and Hindi as well as sample annotations for Hebrew and Russian, together with an annotation manual.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Automatic Classification of Communicative Functions of Definiteness

Definiteness expresses a constellation of semantic, pragmatic, and discourse properties—the communicative functions—of an NP. We present a supervised classifier for English NPs that uses lexical, morphological, and syntactic features to predict an NP’s communicative function in terms of a language-universal classification scheme. Our classifiers establish strong baselines for future work in thi...

متن کامل

A Unified Representation For Morphological, Syntactic, Semantic, And Referential Annotations

This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper d...

متن کامل

A Recursive Annotation Scheme for Referential Information Status

We provide a robust and detailed annotation scheme for information status, which is easy to use, follows a semantic rather than cognitive motivation, and achieves reasonable inter-annotator scores. Our annotation scheme is based on two main assumptions: firstly, that information status strongly depends on (in)definiteness, and secondly, that it ought to be understood as a property of referents ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014